The file site_id_provinces was constructed as follows:
Grid cells in grid_final that are not in site_id_provinces:
## Joining, by = "EEA_Gridcell"
All grid cells in site_id_provinces are also in grid_final:
## Joining, by = "EEA_Gridcell"
## # A tibble: 0 x 3
## # ... with 3 variables: EEA_Gridcell <chr>, gewest <chr>, province <chr>
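The two checks above amount to an anti-join on EEA_Gridcell, once in each direction. A minimal sketch in plain Python follows. The `anti_join` helper, the cell codes and the attribute values are made up for illustration; this is not the original analysis code.

```python
def anti_join(left, right, key="EEA_Gridcell"):
    """Return the rows of `left` whose key value does not occur in `right`."""
    right_keys = {row[key] for row in right}
    return [row for row in left if row[key] not in right_keys]


# Hypothetical toy tables standing in for the real files.
grid_final = [
    {"EEA_Gridcell": "1kmE3800N3100"},
    {"EEA_Gridcell": "1kmE3801N3100"},
    {"EEA_Gridcell": "1kmE3802N3100"},
]
site_id_provinces = [
    {"EEA_Gridcell": "1kmE3800N3100", "gewest": "gewest_A", "province": "province_A"},
    {"EEA_Gridcell": "1kmE3801N3100", "gewest": "gewest_A", "province": "province_B"},
]

# Cells in grid_final that are missing from site_id_provinces.
only_in_grid_final = anti_join(grid_final, site_id_provinces)

# Cells in site_id_provinces that are missing from grid_final (expected: none).
only_in_site_id_provinces = anti_join(site_id_provinces, grid_final)

print([row["EEA_Gridcell"] for row in only_in_grid_final])
print(len(only_in_site_id_provinces))
```

An empty result in the second direction corresponds to the zero-row tibble shown above.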
The file grid_files_FINAL contains cells along the border that are not in site_id_provinces and that have most of their area outside Belgium (zoom in on the border to see this):
However, for the species observations we used a mapping from UTM1 Belgium to the EEA reference grid. The UTM1 Belgium file contained the UTM1 grid cells that overlap Belgium (blue squares: UTM1 grid cells; black squares: the EEA reference grid cells they map to).
The file site_id_provinces.csv was thus misleading and should not be used.
Some small differences remain (see the map below and zoom in to see the discrepancies), but we made sure that the final observation records only contain reference grid cells that are in grid_files_FINAL.
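Restricting the observation records to reference grid cells that occur in the final grid can be sketched as a simple membership filter. The function name, the `species` field and all values below are hypothetical placeholders, not the real records.

```python
def keep_cells_in_grid(observations, grid, key="EEA_Gridcell"):
    """Keep only the observations whose grid cell occurs in `grid`."""
    valid_cells = {row[key] for row in grid}
    return [obs for obs in observations if obs[key] in valid_cells]


# Hypothetical final grid and observation records.
final_grid = [
    {"EEA_Gridcell": "1kmE3800N3100"},
    {"EEA_Gridcell": "1kmE3801N3100"},
]
observations = [
    {"species": "species_A", "EEA_Gridcell": "1kmE3800N3100"},
    {"species": "species_B", "EEA_Gridcell": "1kmE3999N3100"},  # not in the final grid
    {"species": "species_C", "EEA_Gridcell": "1kmE3801N3100"},
]

filtered = keep_cells_in_grid(observations, final_grid)
print([obs["species"] for obs in filtered])
```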
Green squares, not filled: UTM1 Belgium grid
Black squares, not filled: EEA grid squares that we selected by mapping UTM1 centroids to the EEA grid.
Black and filled squares: EEA grid squares that are missing from our UTM1-to-EEA mapping. For any point within one of these squares, translating the point to its UTM1 centroid and then to the EEA grid always selects one of the surrounding EEA grid cells.
Blue and filled squares: not in the final grid because they lie outside Belgium. The translation can select EEA grid cells that lie completely outside Belgium; these were removed from the final grid. This is no longer an issue in the updated observation files, because EEA grid cells that were not in grid_file_FINAL were removed.
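Why some EEA squares are never selected can be illustrated with a toy model. The real conversion needs a proper reprojection between UTM and the projection of the EEA reference grid (ETRS89-LAEA). In the sketch below the two grids are replaced by two unit grids that are rotated against each other by an exaggerated angle of about 36.9 degrees, chosen only so that all arithmetic stays in integers. The rotation, the offset and the function name are assumptions, not the transformation that was actually used.

```python
def centroid_to_target_cell(i, j):
    """Map the centroid of source cell (i, j) to the target cell containing it.

    Toy assumption: the centroid sits at (0.8*i - 0.6*j, 0.6*i + 0.8*j) in
    target coordinates. Coordinates are kept in tenths of a cell so that the
    floor division below is exact.
    """
    x_tenths = 8 * i - 6 * j
    y_tenths = 6 * i + 8 * j
    return (x_tenths // 10, y_tenths // 10)


# A 5 x 5 window of target cells; the pattern repeats with this period.
window = {(cx, cy) for cx in range(5) for cy in range(5)}

# Source cells are taken from a range wide enough to cover the window.
hits = set()
for i in range(-10, 11):
    for j in range(-10, 11):
        cell = centroid_to_target_cell(i, j)
        if cell in window:
            hits.add(cell)

# Target cells that no source centroid falls into.
skipped = sorted(window - hits)
print(skipped)  # [(0, 3), (1, 0), (2, 2), (3, 4), (4, 1)]
```

In this toy setting 5 of the 25 target cells are never selected, while 5 others each receive two source centroids. With the much smaller misalignment between the real grids, such gaps would be expected to be far rarer, which matches the isolated filled black squares on the map.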